Contour similarity and its implication on inverse prostate SBRT treatment planning

Abstract

Purpose: Success of auto-segmentation is measured by the similarity between automatic and manual contours, which is often quantified by the Dice coefficient (DC). The dosimetric impact of contour variability on inverse planning has rarely been reported. The main aim of this study is to investigate whether automatically generated organs-at-risk (OARs) could be used in inverse prostate stereotactic body radiation therapy (SBRT) planning and whether the dosimetric parameters are still clinically acceptable after radiation oncologists modify the OARs.

Methods and materials: Planning computed tomography images from 10 patients treated with SBRT for prostate cancer were selected and automatically segmented by commercially available atlas-based software. The automatically generated OAR contours were compared with the manually drawn contours. Two volumetric modulated arc therapy (VMAT) plans, autoRec-VMAT (where only the rectum used in optimization was automatically generated) and autoAll-VMAT (where the OARs used in inverse optimization were automatically generated), were generated. Dosimetric parameters based on the manually drawn planning target volume (PTV) and OARs were compared with those of the clinically approved plans (manu-VMAT).

Results: The DCs for the rectum contours varied from 0.55 to 0.74 with a mean value of 0.665. Differences in D95 of the PTV between autoRec-VMAT and manu-VMAT plans varied from 0.03% to −2.85% with a mean value of −0.64%. Differences in D0.03cc of the manual rectum between the two plans varied from −0.86% to 9.94% with a mean value of 2.71%. Differences in D95 of the PTV between autoAll-VMAT and manu-VMAT plans varied from 0.28% to −2.9% with a mean value of −0.83%. Differences in D0.03cc of the manual rectum between the two plans varied from −0.76% to 6.72% with a mean value of 2.62%.

Conclusion: Our study implies that it is possible to use unedited automatically generated OARs to perform initial inverse prostate SBRT planning. After radiation oncologists modify/approve the OARs, the plan qualities based on the manually drawn OARs are still clinically acceptable, and a re-optimization may not be needed.


INTRODUCTION
Segmentation of medical images aims to locate anatomic structures and delineate their boundaries on a digital source such as computed tomography (CT) or magnetic resonance imaging scans, and it is a crucial step in radiation treatment planning. In most cases, segmentation is done by an experienced clinical expert, and this process is often time-consuming 1 and subject to interobserver variations. 2,3 To improve efficiency and reduce the workload of clinicians, auto-segmentation has been introduced to radiation therapy and has been an active area of research for many years. 4-8 Traditional auto-segmentation algorithms 4-8 used in radiation therapy are mostly based on direct analysis of image content and properties such as voxel intensities and/or image gradients, and their performance is limited by the insufficient soft tissue contrast of CT data, which makes it harder to accurately delineate critical organ boundaries. This limitation motivated the search for new segmentation algorithms powered by prior knowledge, one of which is atlas-based segmentation. Atlas-based segmentation applies prior knowledge by using a reference image, also referred to as an atlas (image), in which structures of interest are already segmented. 9 The segmentation of corresponding structures in a new target image is obtained by finding the optimal transformation between the atlas image and the target image. Different transformation techniques, such as demons registration, 10 block-matching, 11 and B-spline 12 registration, have been used in radiotherapy. 13-15 The choice of a suitable and robust error metric, 16 such as mutual information, is very important in handling image noise or changes in image contrast.
To speed up the contouring process and improve consistency among observers, a number of atlas-based contouring products have recently become commercially available. Most of the products use a form of atlas-based contouring facilitated by a model-based method, but these are generally limited to certain organs-at-risk (OARs). 17 Performance of different automatic contouring tools has been reported for several sites, including head and neck, breast, abdomen, and lung. 17-21 Most recently, convolutional neural networks, 22 a concept from the field of deep learning, have been applied to auto-segment head and neck CT images. The Dice coefficient (DC) has also been used to train a deep neural network to automatically segment CT images from breast cancer patients 23 for radiation therapy, and the authors concluded that their algorithm has a significant impact on the workload of clinical staff and on the standardization of care. The performance of auto-segmentation or deformable image registration algorithms is often evaluated by comparing manual contours with automatically generated contours, 13,22,24-28 and the similarity between the two sets of contours is quantified by mathematical metrics. Among these metrics, the DC is one of the most popular. It is defined as $\mathrm{DC} = 2\,|V_1 \cap V_2| / (|V_1| + |V_2|)$, that is, the volume of the intersection of contours $V_1$ and $V_2$ divided by their average volume. DC has a value of 1 when the two contours completely overlap and a value of 0 when they are entirely disjoint (no overlap). Another popular metric of contour difference is the Hausdorff distance (HD), $d_H$, which is defined as

$$d_H(A, B) = \max\left\{ \sup_{a \in A} \inf_{b \in B} d(a, b),\ \sup_{b \in B} \inf_{a \in A} d(a, b) \right\},$$

where $A$ and $B$ are the two contours and $d(a, b)$ is the distance between points $a$ and $b$. HDs were also calculated for selected contours in this study.
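The two similarity metrics described above can be illustrated with a minimal Python sketch. It assumes the contours have already been rasterized into boolean voxel masks on a common grid; the function names and the toy masks are hypothetical and are not taken from the software used in this study. The HD here is computed by brute force over all voxel coordinates and is expressed in voxel units, whereas a clinical implementation would typically use surface points scaled by the voxel spacing.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice coefficient of two boolean voxel masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen for this sketch: two empty masks are identical
    return 2.0 * np.logical_and(a, b).sum() / total

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets of shape (n, dim)."""
    pa = np.asarray(points_a, dtype=float)
    pb = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(pa), len(pb)).
    dists = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    forward = dists.min(axis=1).max()   # point of A that is farthest from B
    backward = dists.min(axis=0).max()  # point of B that is farthest from A
    return max(forward, backward)

# Toy 2D masks (hypothetical data): two 2x2 squares shifted by one voxel.
a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True
b = np.zeros((4, 4), dtype=bool)
b[0:2, 1:3] = True

dc = dice_coefficient(a, b)                              # 2 * 2 / (4 + 4) = 0.5
hd = hausdorff_distance(np.argwhere(a), np.argwhere(b))  # 1.0 voxel
```

The example also shows why the two metrics are complementary: DC summarizes volumetric overlap, while HD reports the single largest local disagreement between the contours.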
Even though the similarity between two contours can be quantitatively measured by DC or HD, the implication of this similarity for inverse treatment planning has rarely been reported. In modern radiation therapy, intensity-modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT) are gaining popularity, especially in stereotactic body radiation therapy (SBRT). These techniques use inverse treatment planning to deliver highly conformal doses to tumors while minimizing doses to critical surrounding organs. Inverse planning is substantially different from traditional forward planning even though the two types share some common procedures. Compared with forward planning, inverse planning is nonintuitive, and the iterative optimization process is like a black box to the user. Consequently, the impact of the contour variations of critical organs, which are often quantified by DC, on the plan quality is not well understood. Therefore, the main aim of this study is to evaluate the impact of contour variability on inverse prostate SBRT planning and to investigate whether unedited OARs could be used to generate plans with quality similar to that of plans optimized with manual OARs. In most clinics, IMRT/VMAT planning will not start until radiation oncologists approve the OARs. In this study, we explore whether the OARs automatically generated by commercially available software could be used to optimize IMRT/VMAT plans before the physicians' final approval is given. Our study is designed to evaluate whether, after radiation oncologists edit/approve the OARs, the plan qualities measured with the manually drawn OARs are still clinically acceptable.

MATERIALS AND METHODS
Planning CT images of 20 prostate patients, who underwent spaceOAR and fiducial marker placement for SBRT, were manually segmented by our experienced experts and used to build the template for an atlas-based segmentation algorithm that recently became available in our clinics. On these images, OARs, such as the rectum, bladder, seminal vesicle, and penile bulb, as well as the prostate and PTV, were manually contoured. CT images from another 10 prostate patients, who underwent spaceOAR and fiducial marker placement for SBRT, were used as target images, and all the critical organs were automatically generated by applying the atlas-based auto-segmentation algorithm. For each of the 10 prostate patients, OARs and PTV were manually drawn by an experienced radiation oncologist, and manu-VMAT plans, based on the manual contours, were optimized and reviewed by at least two physicians before their clinical use. All 10 plans were VMAT plans with 2 full arcs, except for one large patient whose plan used 4 partial arcs to avoid collision. PTV, rectum, bladder, spaceOAR, seminal vesicle, and penile bulb were drawn by a physician before optimization started. During the optimization process, all 10 plans used the rectum and bladder to create OAR dose constraints, and 4000 cGy was prescribed to the PTV except for 1 patient, for whom 3625 cGy was prescribed. In addition to the rectum and bladder, 4 of the 10 plans used ring structures (which include all tissues 1 or 2 cm away from the PTV) in the objective functions to control dose spillage to other normal tissues such as the small bowel and femoral head.
To evaluate the impact of contour variability on inverse planning, we first replaced the manually drawn rectum with the automatically generated rectum and re-optimized the plans. The plans optimized with the automatically generated rectum are denoted as autoRec-VMAT. A second set of plans, denoted as autoAll-VMAT, was generated by replacing all the manually drawn OARs (rectum and bladder), except for the ring structures, with automatically generated OARs and re-optimizing the plans. Both autoRec-VMAT and autoAll-VMAT plans were optimized from scratch, which means the beams were reset before optimization started. In both cases, the beam arrangements were the same as in the original plans, and the dosimetric constraints and the weights of the objectives were adjusted to achieve the best possible plans. As the re-optimization process started from scratch, the differences in plan parameters reflect the impact of the OAR contours on the inverse planning process. To quantify the contour variability, DCs between the manual and automatic OARs were evaluated, and HDs between rectums were also calculated to better understand the impact of contour variability.
The dosimetric parameters based on the manually drawn contours, such as D95 of the PTV and the dosimetric parameters for the manually drawn rectum and bladder, were compared among the manu-VMAT, autoRec-VMAT, and autoAll-VMAT plans. D0.03cc and D20 of the bladder were mainly used by our radiation oncologists to evaluate the plan quality. In our clinics, D95 of the PTV is generally required to be no less than 3625 cGy to achieve proper tumor control. To control the risk of complications, D0.03cc of the bladder should be no more than 4000 cGy, and D20 of the bladder is expected to be less than 1830 cGy (the secondary goal is D20 < 3000 cGy). For the rectum, D0.03cc of the anterior rectum, D3cc of the lateral rectum, and D0.03cc of the posterior rectum were used by our physicians. Generally, D0.03cc of the anterior rectum is expected to be less than 3900 cGy, D3cc of the lateral rectum less than 2000 cGy, and D0.03cc of the posterior rectum less than 1900 cGy. These dosimetric goals are desired values and can be adjusted by our radiation oncologists during plan review to achieve a balance between PTV coverage and OAR dose limits. It is worth emphasizing that even though automatically generated contours were used in the inverse optimization process, only manually drawn contours were used to evaluate the plan qualities. Figure 1 shows the workflow of our study scheme.

FIGURE 1 The workflow of our study scheme to evaluate the impact of Dice coefficient (DC) on inverse plan quality
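As an illustration of how the dose-volume parameters named above can be read from a dose grid, the following Python sketch computes a percent-volume metric (such as D95) and an absolute-volume metric (such as D0.03cc) by simple voxel counting. It assumes the doses of all voxels inside one manually drawn structure are available as a flat array and that all voxels have the same volume. The function names, voxel size, and dose values are hypothetical, and D20 is assumed here to mean the dose to 20% of the structure volume. A treatment planning system would normally interpolate the dose-volume histogram rather than count voxels.

```python
import numpy as np

def dose_at_volume_percent(doses_cgy, percent):
    """D_x%: minimum dose received by the hottest x% of the structure volume."""
    d = np.sort(np.asarray(doses_cgy, dtype=float))[::-1]  # hottest voxel first
    n = int(np.ceil(percent * d.size / 100.0 - 1e-9))      # voxels in x% of volume
    n = min(max(n, 1), d.size)
    return d[n - 1]

def dose_at_volume_cc(doses_cgy, volume_cc, voxel_volume_cc):
    """D_Vcc: minimum dose received by the hottest `volume_cc` of the structure."""
    d = np.sort(np.asarray(doses_cgy, dtype=float))[::-1]
    n = int(np.ceil(volume_cc / voxel_volume_cc - 1e-9))   # voxels in that volume
    n = min(max(n, 1), d.size)
    return d[n - 1]

# Hypothetical PTV: 100 voxels, 97 at 4000 cGy and 3 cold voxels at 3500 cGy.
ptv_doses = np.full(100, 4000.0)
ptv_doses[:3] = 3500.0
ptv_d95 = dose_at_volume_percent(ptv_doses, 95)             # 4000.0

# Hypothetical bladder: 100 voxels of 0.01 cc with doses 40, 80, ..., 4000 cGy.
bladder_doses = np.arange(1, 101) * 40.0
bladder_d003 = dose_at_volume_cc(bladder_doses, 0.03, 0.01)  # 3920.0

# Goals quoted in the text above.
meets_ptv_goal = ptv_d95 >= 3625.0           # PTV D95 no less than 3625 cGy
meets_bladder_goal = bladder_d003 <= 4000.0  # bladder D0.03cc no more than 4000 cGy
```

In the setting of this study, the dose array would always be sampled inside the manually drawn structure, regardless of which contour set was used during optimization.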

DC and PTV D95
The manu-VMAT plans. DCs of bladders are also shown in Table 2, and they vary from 0.73 to 0.91 with an average of 0.83 and a standard deviation of 0.07. PTV D95 differences vary from 0.79% to −2.9% with a mean value of −0.84%. PTV D95 values for the autoAll-VMAT plans (optimized with automatically generated OARs) are close to or slightly better than those of the plans optimized with manual OARs.

Figure 2 compares the dose distributions between autoRec-VMAT and manu-VMAT plans. The prescribed doses are 3625 cGy (a) and 4000 cGy (b), whereas the DCs for the two cases are 0.74 (a) and 0.55 (b). The isodose distributions based on the automatically segmented rectum (on the left of (a) and (b)) and the manual rectum (on the right of (a) and (b)) are shown side by side for comparison. By visual inspection, the hot spots (red), the coverage of the prescribed isodose line (green), and the 80% of prescribed isodose line (yellow) look very similar.

Table 3 summarizes the D0.03cc of the anterior rectum for autoRec-VMAT and manu-VMAT plans. The percentage differences between the two D0.03cc values are also listed; they range from −0.9% to 9.9% with an average of 2.7%. Similarly, the percentage difference is calculated via $1 - (D_{0.03cc})_{auto} / (D_{0.03cc})_{manual}$. In most cases, D0.03cc for the autoRec-VMAT plan is less than D0.03cc for the manu-VMAT plan, except for Patients 4 and 10. Generally, the manual rectum D0.03cc for the autoRec-VMAT plan is close to or less than that of the manu-VMAT plan, even though the DC varies from 0.55 to 0.74.
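The sign convention of the percentage difference can be made explicit with a short Python sketch. The function name and the dose values are hypothetical, and the factor of 100 is an assumption made here to express the ratio as a percentage. With this convention, a positive value means that the plan optimized with automatic contours delivers a lower dose to the manually drawn structure than the manu-VMAT plan.

```python
def percent_difference(d_auto_cgy, d_manual_cgy):
    """Percentage difference 100 * (1 - D_auto / D_manual) between two plans."""
    if d_manual_cgy <= 0:
        raise ValueError("manual-plan dose must be positive")
    return 100.0 * (1.0 - d_auto_cgy / d_manual_cgy)

# Hypothetical D0.03cc values in cGy, not taken from the study tables.
lower = percent_difference(3800.0, 4000.0)   # about +5: auto-plan dose is lower
higher = percent_difference(4040.0, 4000.0)  # about -1: auto-plan dose is higher
```

For an OAR metric, a positive difference is therefore favorable to the automatically contoured plan; for a coverage metric such as PTV D95, the same sign would instead indicate lower coverage.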

Dosimetric parameters of manual rectums and bladders
Other than D0.03cc of the anterior rectum, D3cc of the lateral rectum and D0.03cc of the posterior rectum are often used to evaluate the plan quality in our clinics. Therefore, D3cc of the lateral rectum and D0.03cc of the posterior rectum achieved by the two plans (autoRec-VMAT and manu-VMAT) are listed in Tables 4 and 5. For all patients, D3cc of the autoRec-VMAT plans is less than that of the manu-VMAT plans. D0.03cc of the posterior rectum for the autoRec-VMAT plans is mostly less than that for the manu-VMAT plans, except for Patients 1 and 7. Table 6 lists the D0.03cc of the anterior rectum for autoAll-VMAT and manu-VMAT plans. Tables 7 and 8 summarize D3cc of the lateral rectum and D0.03cc of the posterior rectum for the two plans. Our results show that D0.03cc and D3cc to the manually drawn rectums from autoAll-VMAT plans were either close to or better than those from manu-VMAT plans. Tables 9 and 10 compare D0.03cc and D20 of the bladder between autoAll-VMAT and manu-VMAT plans.

Figure 3 compares the automatically segmented rectum (blue) and the manually drawn rectum (purple) in three views for the two patients with maximum (Patient 1 (a)) and minimum (Patient 9 (b)) DC. The two rectums shown in Figure 3a are visually more conformal with each other, whereas the two rectums in Figure 3b show larger disparities in the posterior and lateral regions. Figure 4 compares the DVHs of the PTV (blue) and manually drawn rectums (purple) for the plans optimized with the automatically segmented rectum (dashed lines) and the manually drawn rectum (solid lines), respectively. Similar to Figure 3, Figure 4a compares the DVHs for the patient with maximum DC (Patient 1), and Figure 4b shows the DVHs for the patient with minimum DC (Patient 9). Generally, the DVHs are very close to each other, and rectum doses are lower for the autoRec-VMAT plan than for the manu-VMAT plan.
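For readers less familiar with DVHs, the curves compared above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the doses of the voxels inside one structure are given as a flat array, all voxels have equal volume, and the function name, bin width, and dose values are hypothetical.

```python
import numpy as np

def cumulative_dvh(doses_cgy, bin_width_cgy=10.0):
    """Return dose levels and the percent of volume receiving at least each level."""
    d = np.asarray(doses_cgy, dtype=float)
    levels = np.arange(0.0, d.max() + bin_width_cgy, bin_width_cgy)
    # Compare every dose level with every voxel, then average over voxels.
    volume_pct = (d[None, :] >= levels[:, None]).mean(axis=1) * 100.0
    return levels, volume_pct

# Hypothetical structure with four voxels.
levels, volume_pct = cumulative_dvh([100.0, 200.0, 300.0, 400.0], bin_width_cgy=100.0)
# levels     -> [0, 100, 200, 300, 400] cGy
# volume_pct -> [100, 100, 75, 50, 25] percent
```

Computing such a curve for the same manually drawn rectum under two dose distributions (one from each plan) gives the kind of paired solid and dashed lines that a figure like Figure 4 compares.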

DISCUSSION
The promise of greatly reducing cost and improving efficiency in radiation therapy has motivated the search for and evaluation of automatic treatment planning algorithms; 29,30 however, auto-segmentation has been a bottleneck in automatic treatment planning. Consequently, the development of fully automatic segmentation algorithms for radiation therapy has been an active area of research for many years. 6,14,15,22,27 Due to the low contrast of soft tissue and image artifacts, the performance of the algorithms is still suboptimal, which has prevented the widespread use of auto-segmentation tools in clinics. Most researchers use contour similarity, quantitatively measured by the DC between automatically segmented contours and manual contours, to evaluate the performance of the algorithms. However, contour similarity may not fully reveal the impact of contour variability on radiation therapy, which very likely depends on the specific application. The implication of contour similarity for a specific application in radiation therapy should be investigated case by case. Therefore, this study intends to evaluate the impact of contour similarity on the applicability of unedited OARs in inverse prostate SBRT planning. SBRT is a radiation treatment modality that combines a high degree of targeting accuracy and reproducibility with very high doses of extremely precise radiation delivered in short courses. Some clinical studies 31,32 have shown no difference in late toxicity, patient-reported quality of life, or tumor control between prostate SBRT and conventional external beam radiation therapy (78 Gy in 39 fractions). These studies suggest that prostate SBRT is a safe and effective treatment modality for low-to-intermediate risk prostate cancer. Inverse treatment planning techniques, such as IMRT and VMAT, have been the dominant tools for generating highly conformal SBRT plans that maximize doses to the target while sparing critical organs.
Due to the nonintuitive nature of the inverse planning process, the correlation between contours and the planning process is not straightforward, and careful study is needed to better understand the impact of contour similarity on plan quality. Our study shows that even though the DC, which is often used to quantitatively measure contour similarity, can vary from 0.55 to 0.74, these contour variabilities do not prevent us from using the automatically generated OARs to generate clinically acceptable plans, as shown in Tables 1-10. This information could be very helpful for understanding how to use auto-segmentation tools effectively in radiation therapy. Lustberg et al. 33 evaluated atlas-based and deep learning contouring for lung cancer and concluded that the total median time saved was 7.8 min for atlas-based contouring and 10 min for deep learning contouring. They concluded that user adjustment of software-generated contours is a viable strategy to reduce the contouring time of OARs for lung radiotherapy. Our results imply that the amount of work spent adjusting automatically generated contours could possibly be further reduced. In our clinics, planning will not start until radiation oncologists approve all the OARs; our results imply that planning could start with the automatically generated OARs and that the final plan may not need to be re-optimized after radiation oncologists modify the OARs. This could help us optimize the planning workflow and further improve efficiency in clinics.
Intra-observer and interobserver variabilities (IOV) in contouring have been well studied and reported in the literature, 34,35 and a framework for use in future studies evaluating IOV has been recommended. 35 Most auto-segmentation studies measure the performance of the algorithm by comparing auto-generated contours with expert segmentations, 25,27 and the impact of the inherent contour variability on the intended application in radiation therapy should be well understood before clinical use. Our results show that, when used for inverse prostate SBRT treatment planning, the dosimetric impact of OAR contour variability on the plan quality is limited, and this information could be very helpful in developing automatic adaptive radiation therapy schemes.

CONCLUSION
This study implies that contour similarity alone might not be a good indicator of the usefulness of automatically generated contours in radiation therapy; therefore, it should not be the only metric used to evaluate the performance of auto-segmentation algorithms.
In addition, contour variations might be inherent and unavoidable; hence, a better understanding of the impact of contour variation on radiation therapy could provide some guidance for further improving auto-segmentation algorithms. Our study implies that unedited OAR contours could be used for inverse prostate SBRT planning to generate plans that are clinically acceptable when evaluated according to the manual contours. In our clinical practice, OARs need to be approved by a radiation oncologist before planning starts.
Our study implies that inverse planning could start before the OARs are approved and that the final plan quality, characterized by the manually drawn OARs, could still be clinically acceptable. This information could help us optimize our planning workflow and further improve efficiency in clinics.

AUTHOR CONTRIBUTIONS
The corresponding author Chenyu Yan was responsible for the design of the study. Bingqi Guo helped explain the atlas-based segmentation algorithm and critically reviewed the article. Rahul Tendulkar contributed all the patient data used in this study. Ping Xia provided valuable guidance and recommendations for the study.

ACKNOWLEDGMENT
All the authors contributed to this study and the authors are grateful to Cleveland Clinic Foundation where the data were collected and analyzed.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.